Digital data processing apparatus

ABSTRACT

A modular data processing unit is adapted to form part of a processing system comprising a plurality of such processing units connected in master-slave relationship. Each processing unit comprises a control section responsive to a control instruction to perform the execution of a program stored therein and to switch the processing unit to an idle mode on completion of the program execution, an internal memory section including an area of read/write memory; a computation section for performing calculations in accordance with the program instructions during which it can access the internal memory; and interface means through which the processor unit can be connected as a master unit to one or more slave units to monitor and control, and to access the internal memories of the slave units while it is executing a program, and through which it can be connected as a slave to a master processing unit by which it can thus be monitored and controlled, and have its internal read/write memory accessed. In such a multi-processor system, a master unit may be connected to a number of slave units each programmed to carry out a particular function, and in turn each of these slave units may be connected as a master to its own set of slave units, again each programmed to perform a particular function; the whole system being coordinated and controlled by the original master unit.

This invention relates to digital data processing apparatus.

The invention is particularly though not exclusively concerned with digital data processing apparatus suitable for special purpose signal processing applications in which the amount of computing power required is substantially more than can be obtained from a single bit-slice central processing unit (CPU).

BACKGROUND OF THE INVENTION

It is accepted that considerable savings in the cost and size of high power special purpose signal processors could be achieved if it were possible to effectively use current highly integrated components such as bit-slice CPUs. The reason for this is that these more highly integrated components are considerably cheaper per gate than the medium and large scale discrete integrated circuit combinations from which special purpose hardware signal processors are usually constructed, even allowing a substantial factor for inefficiency in using those gates. On the other hand, a special purpose signal processor is almost inevitably more efficient in dealing with signal processing problems than a general purpose computer, because the latter incurs speed penalties in the serial nature of its operations and fairly high hardware overheads because of its generality (eg the need to be able to store and load programs).

However, difficulties do arise in the use of highly integrated components such as bit-slice CPU devices in any application where the amount of computing power is substantially more than can be obtained from a single CPU device. The main difficulty is in the definition of an architecture in which a number of such devices can cooperate to produce the necessary computing power without high overheads or complex operating systems. There are also associated practical difficulties of programming a complex multiple processor system and in maintaining the system when constructed.

It is an object of the present invention to provide means whereby at least some of the above-mentioned disadvantages may be overcome or at least substantially reduced.

SUMMARY OF THE INVENTION

According to the present invention, a data processing unit comprises:

a control section responsive to a control instruction to switch the processing unit from an idle mode (as hereinafter defined) to an execution mode, the control section being operative during the execution mode to initiate and control the sequence of execution of a program stored therein and to automatically switch the processor unit to the idle mode when execution of the program is completed;

a memory section including an area of read/write memory;

a computation section for performing calculations in accordance with instructions from the program and capable of accessing the read/write memory area during execution of the program;

and interface means through which the processing unit can access an external memory when in its execution mode, through which at least part of the read/write memory area of the processing unit can be accessed when in its idle mode, and through which an instruction for switching the processing unit from its idle to its execution mode can be received.

Thus, data on which the processing unit is to perform its computations in accordance with the program stored in its control section may be loaded into the read/write memory area from outside via the interface means when the processing unit is idle prior to initiation by the start instruction. In the course of execution of the program, this data may be read from the read/write memory area by the control section, and calculations performed on it before the "answer" is written back into the read/write memory area. In the course of executing the program, the processing unit may also access an external memory via its interface means either to write data into it or to read data from it. When the processor unit has completed execution of the program, and is switched to its idle mode, the internal read/write memory may then be accessed from outside, via the external bus system, eg to recover the "answer". Alternatively, the processing unit may write the data into an external memory before switching to its idle mode.

Because a processing unit in accordance with the invention can both access an external memory when in its execution mode, and have its read/write memory area accessed when in its idle mode, the interface units of two such processing units may be connected together so that one of them, when in its execution mode, can access the internal read/write memory area of the other when in its idle mode.

The interface means may conveniently comprise a master interface unit for connecting the processing unit to an address bus through which the processing unit can receive address words designating locations in its internal read/write memory area, and to a data bus through which it can receive or transmit data to be written into or read from the memory locations designated by the address words; and a slave interface unit for connecting the processing unit to an address bus through which the processing unit can transmit memory addresses to an external memory area and to a data bus through which data to be written into or read from the external memory can be transmitted or received. Thus, the slave interface unit of a first processing unit may be connected to the master interface unit of a second processing unit by means of data and address buses in such a way that the first processing unit when in execution mode can access the internal read/write memory of the second processing unit when the latter is idle. Similarly the slave interface unit of the second processor unit may also be connected to the master interface unit of a third processor unit such that the second processor unit when in its execution mode can access the internal read/write memory of the third processing unit when the latter is idle. Thus any number of processing units may be connected together in a chain configuration.

Preferably, however, the external memory address capacity of the processing unit via its slave interface unit is substantially greater than the internal read/write memory address capacity that is accessible from outside via its master interface unit, thereby enabling a number of processing units to be connected to the slave interface unit of a single processing unit.

Preferably a part of said externally accessible internal read/write memory area is accessible also from the control section and is preferably provided by a separate memory device.

This part of the internal read/write memory area provides means whereby control data for controlling the operation of the processor can be written into the processing unit from outside, and which can also be read by the control section to derive the start instruction and initiate execution of a program. Furthermore, it enables the processing unit to be directly controlled by another processing unit of the same kind via their respective interface units.

Thus, if the slave interface unit of a first processing unit is connected to the master interface unit of a second processing unit, the first processing unit can access, for the purpose of writing control data into it, the part of the internal read/write memory area of the second processing unit which is accessible also to the control section thereof. The first processing unit can thus control the second processing unit, and similarly any other processing units whose master interface units are connected to its slave interface unit.

Preferably said part of the internal read/write memory area which is accessible to the control section is accessible both during the idle and the execution mode of the processing unit, and the control section includes means for writing into it data indicating a change in the activity status, idle or executing, of the processing unit. This enables the activity status of a first unit to be monitored by a second processing unit which has access to the internal read/write memory area. Furthermore it enables the second processing unit to write new control data into the first processing unit whether the latter is in its idle or execution mode. This facility may be used to effect switching of the first processing unit between its idle and its execution modes eg to allow the second processing unit to access the read/write memory area of the first processing unit when the latter is in the middle of a program execution.

It will be apparent that the said part of the internal memory area which is accessible both to the control section and from outside via the (master) interface unit need only comprise a 1-bit memory word, one value (eg `0`) indicating the idle mode and the other (eg `1`) indicating the execution mode. This one bit may be written or read from outside for control and monitor purposes respectively; or written or read from inside by the control section on the one hand to provide an indication of the processor unit's activity status for external monitoring purposes, and on the other to enable the control section to respond to a change in the bit value to switch between modes.

Preferably however, the control section is capable of controlling the execution of a plurality of programs the sequential instructions of which are stored in a programmable read only memory (PROM) each having a different starting address in the PROM. In this case, the part of the internal read/write memory which is accessible both to the control section and from outside via the (master) interface unit will include, in addition to the above single status bit, address word bits into which the PROM start address for any one of the stored programs can be written from outside, and which can be read by the control section for program selection.

Although a processing unit in accordance with the invention can be used on its own to perform data processing operations, and in particular signal processing operations, the purpose of the invention is to provide a modular processing unit which may be connected to cooperate with one or more other processing units of the same kind in carrying out complex data processing operations requiring computing power greater than that which can be achieved using a single processor unit. In designing a multiple processor system for such applications using a number of processing units in accordance with the invention, each processing unit may be programmed to carry out a particular function, which function may include the designation of parts of that function to one or more other processing units and the control of those processing units in performing their particular functions. In addition, any of the processing units in the system may be given access, by means of its (slave) interface means, to external memory or input/output devices for example, to provide resources not available from the combination of several processors. The processing unit can access these other devices in the same way as it accesses the internal memories of other processing units. In the foregoing and in the following description it is to be understood that the processing unit remains live while in the idle mode and is defined to be "idle" only by virtue of the fact that, in this mode, the control section is not employed in controlling the execution of a program stored therein.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described in greater detail, by way of example only, with reference to the accompanying drawings, of which:

FIG. 1 is a block schematic diagram of a processor unit in accordance with the present invention;

FIG. 2 is a block schematic diagram of part of the processor unit of FIG. 1; and

FIG. 3 is a block schematic diagram of a processor system incorporating a plurality of processor units of the kind shown in FIG. 1.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to FIG. 1 of the drawings the processor unit is divided for convenience into three sections separated in the drawing by broken lines. These sections comprise a control section 1, containing a microcode memory in which one or more microprograms which determine the function or functions of the processor unit are stored; a memory section 2 containing an area of read/write memory for storing variable data and an area of programmable read only memory (PROM) in which fixed data tables, such as sine and cosine look-up tables, can be stored; and a computation section 3 in which the mathematical data, memory address, and loop counting computations are performed in accordance with instructions from the microprogram.

The processor unit is also provided with two interface arrangements for connection to respective sets of data and address buses. The first of these interface arrangements, which will hereinafter be referred to as the slave interface arrangement (44,45), can be used to connect the processing unit to one or more other processing units of the same kind, to enable it to access the memory sections of, and to monitor and control the operation of, those other (slave) processing units. This slave interface arrangement may also enable the processing unit to be connected to external memories or I/O latches.

The second of the interface arrangements, which will hereinafter be referred to as the master interface arrangement (42,43), enables the processing unit to be monitored and controlled, and have part of its internal memory section accessed from outside, eg from another (master) processor unit of the same kind. This is achieved by connecting the master interface arrangement of the processing unit to the slave interface arrangement of this other (master) processing unit.

Thus, a number of processor units may be connected together in master/slave relationships to form a `tree` structure by interconnecting the master interface arrangements of some of the processing units with the slave interface arrangements of others as illustrated schematically in FIG. 3. As indicated earlier, a number of processing units may be connected as slaves to a common master, but a slave processing unit will normally be connected to only one master processing unit. Furthermore, a processing unit which is connected as a slave to a master processing unit may also be connected as a master to one or more slave units.

The data and address bus set which interconnects the slave interface arrangement of a master processing unit with the master interface arrangement of each of its slave processing units enables it to monitor the operation of the slave to determine whether it is idle or whether it is executing a program, these being the only two modes of which a processing unit is capable. The same data and address bus set also enables the master processing unit to independently switch any one of its slave processing units from its idle to its execute mode, ie to start the slave unit, or to switch it from its execute to its idle mode, ie to stop the slave unit. When switching from idle to execute, the master may also select which of the slave's microprograms is to be executed. When the slave is in its idle mode, typically before it commences execution of a microprogram or when it has completed execution of a microprogram on a particular block of data, the master may access the slave's internal memory section, either to write data into, or to read data from it. The data read from (or written into) the slave's memory section may be transferred to (or from) the master's own internal memory section, the internal memory section of another one or more of its slaves, or an external memory or I/O latch to which the master's slave interface arrangement is connected.

A typical sequence of operation would begin with a master in execution mode and its slaves idle. In this situation the master may write data into its slaves' memory sections, select appropriate programs within the slaves' microcode memories, and then switch them to execution mode. The master may continue in execution mode and then monitor the status of its slaves until they have completed execution of their respective programs on the data which was supplied to them, and again become idle. The master may then access the slaves' memory sections either to copy the results of slaves' computations into its own memory section, or to transfer intermediate data from one slave to another. While the slaves are in execution mode, they may themselves become masters controlling their own slaves at the next level down in the tree structure.
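The load, start, poll and read-back sequence described above may be illustrated by a minimal software sketch. It is not part of the described apparatus: the class `Unit`, the function `run_master` and the stand-in programs are hypothetical names introduced purely for illustration, and a slave's microprogram is modelled as an ordinary Python function that runs to completion in a single simulated step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

IDLE, EXECUTING = 0, 1  # the only two modes of a processing unit


@dataclass
class Unit:
    """Hypothetical model of one slave as seen through its master interface."""
    programs: Dict[int, Callable[[List[int]], List[int]]]
    memory: List[int] = field(default_factory=list)
    status: int = IDLE
    selected: int = 0

    def write_memory(self, data: List[int]) -> None:
        if self.status != IDLE:
            raise RuntimeError("memory section is accessible only when idle")
        self.memory = list(data)

    def read_memory(self) -> List[int]:
        if self.status != IDLE:
            raise RuntimeError("memory section is accessible only when idle")
        return list(self.memory)

    def start(self, program: int) -> None:
        """Select a stored program and switch to execution mode."""
        self.selected = program
        self.status = EXECUTING

    def step(self) -> None:
        """Simulation stand-in: run the selected program, then go idle."""
        if self.status == EXECUTING:
            self.memory = self.programs[self.selected](self.memory)
            self.status = IDLE


def run_master(slaves: List[Unit], blocks: List[List[int]], program: int) -> List[List[int]]:
    # 1. With the slaves idle, load their data and start them.
    for slave, block in zip(slaves, blocks):
        slave.write_memory(block)
        slave.start(program)
    # 2. Poll the activity status until every slave is idle again.
    while any(s.status == EXECUTING for s in slaves):
        for s in slaves:
            s.step()  # in hardware the slaves would run on their own
    # 3. Copy the results back into the master's own memory.
    return [s.read_memory() for s in slaves]


# Two stand-in programs: 0 doubles each word, 1 sums the block.
PROGRAMS = {0: lambda m: [2 * x for x in m], 1: lambda m: [sum(m)]}
results = run_master([Unit(PROGRAMS), Unit(PROGRAMS)], [[1, 2], [3, 4]], 0)
```

In this sketch the memory access methods refuse access while the unit is executing, mirroring the rule that a slave's memory section is accessed by its master only in the idle mode.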

The top processing unit of the tree structure will normally be connected to a host minicomputer which will thus have overall control of the whole multiple-processor system, and be able to supply input data to, and read output data from, the system via this top processing unit. Additionally input/output of data may be achieved by means of I/O latches connected at appropriate locations within the tree structure.

Thus a processing unit may either be used on its own, or a complex multiple processor system suitable for special purpose signal processing applications may be constructed from a number of identical processing units each adapted to perform a specific function or functions determined by its particular microprogram(s).

A more detailed description of an individual processing unit in accordance with the invention will now be given with reference to FIGS. 1 and 2 of the accompanying drawings. For the purposes of this description it will be assumed that the processing unit is connected as one of a number of slaves to one other processing unit, and is connected as a master to a number of its own slave units.

The control section 1 of the processing unit is built around a microcode memory in the form of a programmable read only memory 5 (PROM) of 1K, 72-bit wide micro-instruction words. Each micro-instruction word represents one step in a microprogram, of which up to 16 may be stored in the PROM, and is largely horizontally coded enabling the various parts of the processing unit to be operated in parallel as will be described below.

Under normal sequential program execution a 10-bit word latched in an address register 6 and designating the address of a micro-instruction word in the PROM 5 is incremented by 1 each cycle so that the micro-instruction words of successive PROM addresses are executed in sequence. Incrementing of the 10-bit PROM address word is effected by an increment counter 7 which, in successive cycles, adds 1 to the numerical value of the address word of the preceding cycle held in the address register 6, and feeds the new address word back into it.

The processing unit is however capable of non-sequential program execution, in which a part of the micro-instruction word can be used to instruct the processing unit either to `jump`, `call`, `return` or `stop`. The `jump` instruction overrides the increment counter 7 and causes a new 10-bit PROM address word, contained in the same 72-bit micro-instruction word as the `jump` instruction itself, to be written into the address register 6 so that the program continues normal sequential execution from the new address (at least until a further non-sequential instruction appears). The program may thus be made to `jump` from and to any micro-instruction word stored in the PROM 5.

The `call` instruction is used in conjunction with the `return` instruction, and is similar to the `jump` instruction in that it causes the program to jump to a new micro-instruction word designated by a 10-bit PROM address word contained in the same micro-instruction word as that which executes the call instruction. The `call` instruction, however, is always followed at some later stage in the program sequence by a `return` instruction which causes the program to return to the micro-instruction word succeeding that which contained the original `call` instruction. Thus, when a `call` instruction is executed, the `return` address is taken from the increment counter 7 and is stored in a stack 8 (last-in-first-out buffer) which will allow the nesting of return addresses up to 4 deep. Thus up to three additional `call` and `return` instructions may be executed between the first occurring `call` instruction and its associated `return` instruction.

Finally, the fourth program instruction which may be used to alter the normal sequential operation of the program is the `stop` instruction which simply changes the operating mode or status of the processor unit from running to idle, following which its memory section 2 may be accessed from outside as discussed earlier.

Execution of each of the four sequence-control instructions described above may be conditional upon the value of a 1-bit flag register 16, the conditionality being determined by one bit in the micro-instruction word.
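The sequencing behaviour described above (an incrementing 10-bit address, `jump`, `call`, `return` and `stop`, and a return stack 4 deep) may be sketched in software as follows. This is an illustrative model only: the tuple layout, the function name `run_sequencer` and the representation of conditionality as an optional required flag value are assumptions made for the sketch, and a non-sequential instruction whose condition is not met is assumed simply to fall through to the next address.

```python
from typing import List, Optional, Tuple

STACK_DEPTH = 4        # return addresses nest up to 4 deep
ADDRESS_MASK = 0x3FF   # 10-bit PROM address

# (op, target, cond): op is "next", "jump", "call", "return" or "stop";
# cond is None (unconditional) or the flag value required for op to act.
Instr = Tuple[str, int, Optional[int]]


def run_sequencer(prom: List[Instr], start: int, flag: int = 0,
                  max_cycles: int = 1000) -> List[int]:
    """Return the PROM addresses visited until a `stop` takes effect."""
    address = start
    stack: List[int] = []   # last-in-first-out buffer of return addresses
    trace: List[int] = []
    for _ in range(max_cycles):
        trace.append(address)
        op, target, cond = prom[address]
        incremented = (address + 1) & ADDRESS_MASK  # increment counter output
        taken = cond is None or cond == flag
        if op == "next" or not taken:
            address = incremented
        elif op == "jump":
            address = target
        elif op == "call":
            if len(stack) == STACK_DEPTH:
                raise OverflowError("call nesting exceeds stack depth")
            stack.append(incremented)  # return to the succeeding word
            address = target
        elif op == "return":
            address = stack.pop()
        elif op == "stop":
            return trace
        else:
            raise ValueError(f"unknown operation {op!r}")
    raise RuntimeError("no stop instruction reached")


demo_prom: List[Instr] = [
    ("call", 3, None),    # 0: call the routine at address 3
    ("jump", 5, None),    # 1: after returning, jump to 5
    ("next", 0, None),    # 2: never reached
    ("next", 0, None),    # 3: body of the routine
    ("return", 0, None),  # 4: back to address 1
    ("stop", 0, None),    # 5: switch to idle
]
demo_trace = run_sequencer(demo_prom, start=0)
```

Here `demo_trace` records the order in which the micro-instruction addresses are visited, which makes the effect of the return stack easy to inspect.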

The control section 1 of the processing unit also includes control logic circuitry 9, and a clock control unit 10.

The clock control unit 10 is controlled via a clock line 12 by the clock control unit of its master processor unit, and the clock control units of its slave units are controlled by the clock control unit 10 via clock line 13. The clock control unit 10 determines the cycle time of the processing unit via the control logic circuitry 9.

The control logic circuitry 9 also performs several other functions. It is used to override any one of the four sequence control instructions when such instructions are conditional upon the value of the 1-bit flag register 16 and when the flag register bit is set at `0`. It also provides the control section 1 with access to a status register 14 which effectively forms part of the memory section 2 of the processing unit, and which is accessible also from outside via the master interface arrangement. The status register 14 provides the means whereby the operation of the processing unit can be controlled and monitored from outside eg by its master processing unit, and contains 6 bits, S1 to S6 as follows:

    ______________________________________
    S1              Activity status
    ______________________________________
    0               idle
    1               executing
    ______________________________________
    S2              Data distribution status
    ______________________________________
    0               disabled
    1               enabled
    ______________________________________
    S3-S6           4-bit start address on processor initiation
    ______________________________________

Before initiation of a program, ie while the processing unit is idle and bit S1 reads `0`, the master processing unit writes the 4-bit PROM address of the first micro-instruction word of the program into bits S3 to S6 of the status register 14 (the PROM may thus contain up to 16 microprograms having start addresses 1 to 16 respectively) and then sets bit S1 to `1` to initiate execution of the program. The control logic circuitry 9 will then respond to the change in the status bit S1 from `0` to `1` to begin execution of the program, reading its first PROM instruction address from bits S3 to S6 into the address register 6. The master unit may then poll the status bit S1 until it is cleared to `0` by the control logic circuitry 9 upon execution of a `stop` instruction in the microprogram.

The master may also stop execution of the program at any time by clearing bit S1 to `0`, eg to enable it to access the memory section 2 of the processor unit as will be described later.

Bit S2 enables the master to control a distribution mode of transfer either from the master to all slaves which have their S2 bits set at `1`, or from one slave to any of the other slaves which have their S2 bits set at `1` as will also be discussed later.
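By way of illustration only, the 6-bit status register may be modelled as a packed word. The bit positions are an assumption of this sketch (S1 is placed in the least significant bit, S2 next, and S3 to S6 above them); the description does not state how the bits are ordered on the data bus, and the 4-bit start field is treated here as a raw value 0 to 15 without modelling how it maps onto a PROM address. The names `pack_status` and `unpack_status` are hypothetical.

```python
S1_EXECUTING = 1 << 0      # activity status: 0 idle, 1 executing (assumed bit 0)
S2_DISTRIBUTION = 1 << 1   # data distribution: 0 disabled, 1 enabled (assumed bit 1)
START_SHIFT = 2            # S3-S6 assumed to occupy bits 2 to 5
START_MASK = 0xF           # 4-bit start address field


def pack_status(executing: bool, distribution: bool, start: int) -> int:
    """Build the 6-bit status word a master would write to start a slave."""
    if not 0 <= start <= START_MASK:
        raise ValueError("start address field is 4 bits wide")
    word = start << START_SHIFT
    if executing:
        word |= S1_EXECUTING
    if distribution:
        word |= S2_DISTRIBUTION
    return word


def unpack_status(word: int):
    """Split a status word into (executing, distribution, start)."""
    return (bool(word & S1_EXECUTING),
            bool(word & S2_DISTRIBUTION),
            (word >> START_SHIFT) & START_MASK)


# Master selects program 5 and sets S1 to start the slave.
start_word = pack_status(executing=True, distribution=False, start=5)
# Control logic clears S1 on `stop`; the master sees this when polling.
stopped_word = start_word & ~S1_EXECUTING
```

A master polling the slave would repeatedly read the word and test only the S1 bit, leaving the remaining fields untouched.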

The internal operation of the control section 1 is determined by 17 bits (P1 to P17) of each micro-instruction word as follows. The first bit, P1, determines the interpretation of bits P2 to P17 as either a data word or as a sequence control instruction.

    ______________________________________
    P1      Sequence control operation
    ______________________________________
    0       Normal sequential operation; P2-P17 is data word.
    1       Sequence control instruction according to P2-P15
    ______________________________________

If P1 in the micro-instruction word is `1`, then P2 to P15 consists of a sequence control instruction which controls the processor operation as follows:

    ______________________________________
    P2,P3           Type of operation
    ______________________________________
    0               Jump
    1               Call
    2               Return
    3               Stop
    ______________________________________
    P4              Conditionality
    ______________________________________
    0               Unconditional
    1               Conditional according to P5
    ______________________________________
    P5              Condition
    ______________________________________
    0               if flag bit F = 0
    1               if flag bit F = 1
    ______________________________________
    P6-P15          10-bit destination address for jump or call
    ______________________________________

Alternatively, if P1 is `0` in the micro-instruction word, then P2 to P17 consists of a 16-bit data constant word which is latched into register 17 to provide a constant for the computation section 3.
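The interpretation of bits P1 to P17 may be sketched as a small decoder. The sketch assumes that the micro-instruction is held as a list of 0/1 values with P1 first, and that within each multi-bit field the lower-numbered bit is the more significant; neither ordering is stated in the description, and the names `field` and `decode_control` are introduced only for illustration.

```python
from typing import Dict, List, Union

OPERATIONS = ("jump", "call", "return", "stop")  # values 0 to 3 of P2,P3


def field(bits: List[int], first: int, last: int) -> int:
    """Value of bits P<first>..P<last> (1-based, inclusive, first bit = MSB)."""
    value = 0
    for bit in bits[first - 1:last]:
        value = (value << 1) | bit
    return value


def decode_control(bits: List[int]) -> Dict[str, Union[str, int, bool]]:
    """Decode P1-P17 as either a data constant or a sequence control word."""
    if len(bits) < 17:
        raise ValueError("need at least bits P1 to P17")
    if field(bits, 1, 1) == 0:
        # P1 = 0: P2-P17 is a 16-bit constant for the computation section.
        return {"kind": "data", "constant": field(bits, 2, 17)}
    return {
        "kind": "sequence",
        "op": OPERATIONS[field(bits, 2, 3)],
        "conditional": bool(field(bits, 4, 4)),
        "flag_value": field(bits, 5, 5),   # meaningful only if conditional
        "address": field(bits, 6, 15),     # used by jump and call
    }


# Unconditional call to address 341 (binary 0101010101).
call_bits = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
decoded_call = decode_control(call_bits)

# Data word carrying the constant 0x1234.
data_bits = [0] + [(0x1234 >> (15 - i)) & 1 for i in range(16)]
decoded_data = decode_control(data_bits)
```

Because a word with P1 equal to `0` carries a constant in place of a sequence control field, such a word always implies normal sequential operation.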

The remaining bits of the micro-instruction word, bits P18 to P72, are latched into a register 20 and used to control various parts of the memory and computation sections of the processor unit via internal control lines (not shown).

The computation section 3 of the processor unit is controlled by bits P18 to P68 of the micro-instruction word and includes two computational units designated CPU-A and CPU-B in FIG. 1. CPU-A provides the main computational unit in which most of the mathematical data calculations are performed, while CPU-B is used only for address calculations and loop counting. Both CPUs are constructed from identical commercially available 4×4-bit (ie operating on 16-bit words) CPU slices fabricated using large scale integration (LSI) microprocessor technology. The CPUs described here and shown in FIG. 2 by way of example are type Am 2901 manufactured by Advanced Micro Devices Inc., of Sunnyvale, Calif. USA. The computation section also includes a multiplication unit 22 for performing multiplications, and a scale accumulation unit 23 which enables a check to be maintained on the data being transferred into the memory section 2. This gives warning of close-to-overflow conditions, enabling preventive scaling to be done when the data is next accessed. A number of registers D, R, Y, Z, and W are also provided for routing data between various parts of the computation section 3 and the memory section 2.

FIG. 2 shows the main computation unit, CPU-A, in greater detail, and it includes a file 24 of 16 general purpose 16-bit registers and two special purpose registers Q and φ, the latter of which simply provides the operand zero. The register file has two outputs a and b through which the contents of any two registers of the file designated a reg and b reg may be read, and one input d through which data may be read into the register (b reg) associated with output b. The two 4-bit addresses which specify these two registers a reg, b reg are contained in the micro-instruction word.

CPU-A also includes an arithmetic/logical unit 25 which takes two operands x,y, selected by an operand select unit 26 from various sources determined by bits of the micro-instruction word, and performs on them add, subtract and logical operations again determined by the micro-instruction word, to produce a function result f. The sources from which the operand select unit 26 may select the two operands x,y are the two output registers a reg, b reg of the register file 24, register φ, register Q, and the external register D through which all input data enters CPU-A.

In addition to the output function f, the arithmetic/logical unit 25 also makes available four status bits designated SIGN, CARRY-OUT, OVERFLOW and ZERO for storage in the flag register 16 of the control section 1. The SIGN status bit indicates the sign (+ or - in two's complement notation) of the output function value f; the CARRY-OUT bit is the carry bit resulting from an addition or subtraction operation; the OVERFLOW bit is the bit produced in the event of an overflow condition resulting from an arithmetic operation; and the ZERO bit indicates a zero value for the output function f.

The arithmetic/logical unit 25 is also associated with a carry-in select circuit 27 which is used to select a carry-in bit, eg the CARRY-OUT or OVERFLOW status bit, from the flag register 16 when appropriate for add and subtract operations thus enabling multi-word arithmetic operations to be performed.
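As an illustration of how a selectable carry-in permits multi-word arithmetic, the following sketch adds numbers wider than 16 bits one word at a time, passing the CARRY-OUT of each 16-bit addition through a flag into the next. The helper names `add16` and `add_multiword`, and the least-significant-word-first ordering, are assumptions made for the sketch rather than details taken from the description.

```python
from typing import List, Tuple

MASK16 = 0xFFFF  # CPU-A operates on 16-bit words


def add16(x: int, y: int, carry_in: int = 0) -> Tuple[int, int]:
    """One ALU add: return the 16-bit result f and the CARRY-OUT bit."""
    total = (x & MASK16) + (y & MASK16) + carry_in
    return total & MASK16, total >> 16


def add_multiword(a_words: List[int], b_words: List[int]) -> Tuple[List[int], int]:
    """Add two equal-length numbers held as 16-bit words, low word first."""
    if len(a_words) != len(b_words):
        raise ValueError("operands must have the same number of words")
    flag = 0  # carry-in of zero for the least significant word
    result = []
    for x, y in zip(a_words, b_words):
        f, flag = add16(x, y, flag)  # later words take the saved CARRY-OUT
        result.append(f)
    return result, flag


# 0x0001FFFF + 0x00000001 = 0x00020000, computed as two 16-bit additions.
sum_words, final_carry = add_multiword([0xFFFF, 0x0001], [0x0001, 0x0000])
```

The carry out of the low word is what raises the high word from 1 to 2 in this example; with a carry-in fixed at zero the two halves would be added independently and the result would be wrong.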

Also included in CPU-A are two shift networks 28,29 which are controlled by a shift control unit 30. The shift network 28 is for performing 1-bit left or right shifts in the data word (output function f) being entered into the selected input register b reg of the register file, and the shift network 29 is for performing 1-bit left or right shifts in the data word stored in register Q. The shift control unit 30 determines which bits are to be shifted in, and makes the bits shifted out available for storage in the flag register 16.

The output word from the CPU-A is selected by an output select unit 31 from either the output function f of the arithmetic logic unit 25, or the first output register a reg of the register file 24. The output function f from the arithmetic logic unit 25 may also be written into the input register b reg of the register file 24, or the register Q, or both.

The operation of the various parts of CPU-A described above is controlled by a total of 20 bits, P18 to P37, from the micro-instruction word. The first 8 of these bits determine the two register file addresses as follows:

    ______________________________________
    P18-P21     First address (areg) in register file
    ______________________________________
    P22-P25     Second address (breg) in register file
    ______________________________________

The next three bits control the selection of the two operands x and y for the arithmetic/logical unit 25 in accordance with one of the following eight combinations:

    ______________________________________
    P26-P28   First operand (x)   Second operand (y)
    ______________________________________
    0         areg                Register Q
    1         areg                breg
    2         Register φ          Register Q
    3         Register φ          breg
    4         Register φ          areg
    5         Register D          areg
    6         Register D          Register Q
    7         Register D          Register φ
    ______________________________________

The carry-in bit (c) to the arithmetic/logical unit is selected by the next bit:

    ______________________________________
    P29       Carry in (c)
    ______________________________________
    0         0
    1         flag bit
    ______________________________________

and the next three bits of the micro-instruction word determine which of the following eight possible functions is to be performed by the arithmetic/logical unit 25:

    ______________________________________
    P30-P32   Function value (f)
    ______________________________________
    0         x + y + c
    1         y - x - c
    2         x - y - c
    3         x or y
    4         x and y
    5         (not x) and y
    6         x xor y
    7         not (x xor y)
    ______________________________________

Here the first operand in function 5 is the bit-complement of x, xor is the exclusive-or operation, and function 7 gives the bit-complement of the xor result.
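The eight functions selected by bits P30-P32 may be sketched in Python as follows. This is an illustrative sketch only: the 16-bit word width and the name alu are assumptions, function 5 is read as the bit-complement of x ANDed with y, function 7 as the bit-complement of x xor y, and the subtract functions follow the expressions exactly as tabulated.

```python
MASK16 = 0xFFFF  # assumed 16-bit word width


def alu(function: int, x: int, y: int, c: int = 0) -> int:
    """Return the function value f for the code held in bits P30-P32."""
    x &= MASK16
    y &= MASK16
    c &= 1
    if function == 0:
        f = x + y + c
    elif function == 1:
        f = y - x - c
    elif function == 2:
        f = x - y - c
    elif function == 3:
        f = x | y
    elif function == 4:
        f = x & y
    elif function == 5:
        f = ~x & y        # bit-complement of x, ANDed with y
    elif function == 6:
        f = x ^ y
    elif function == 7:
        f = ~(x ^ y)      # bit-complement of the xor result
    else:
        raise ValueError("function code must be 0 to 7")
    return f & MASK16     # keep the low-order 16 bits (two's complement wrap)
```

Negative intermediate results wrap to their two's complement 16-bit form because of the final mask.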

The next three bits of the micro-instruction word control the shift operations of the two shift networks 28,29 and the destination of the output function f (to register b reg or register Q), and also select the output word from CPU-A (either f or the contents of register a reg) as follows (a dash means no change):

    ______________________________________
             Register file      Register Q         Output
    P33-P35  result (breg)      result             word
    ______________________________________
    0        --                 f                  f
    1        --                 --                 f
    2        f                  --                 areg
    3        f                  --                 f
    4        f shifted left     Q shifted left     f
    5        f shifted left     --                 f
    6        f shifted right    Q shifted right    f
    7        f shifted right    --                 f
    ______________________________________

As mentioned above, the shift-out bits are made available for storage in the flag register 16, while the shift-in bit (the same bit for the f shift associated with the input d to register b reg and the register Q) is determined as follows:

    ______________________________________
    P36,P37   Shift-in bit
    ______________________________________
    0         0
    1         1
    2         sign of f
    3         flag bit
    ______________________________________

The sign of f shift-in bit is derived from the SIGN status bit produced by the arithmetic/logical unit 25.
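The division of bits P18 to P37 into control fields may be sketched in Python as follows. The sketch assumes that the micro-instruction word is available as a sequence of bits indexed by P number, and that the lowest-numbered bit of each field is its most significant bit; neither convention is stated in the description, and the field names are hypothetical labels.

```python
from typing import Dict, Sequence, Tuple

# (first bit, last bit) of each CPU-A control field, as tabulated above.
CPU_A_FIELDS: Dict[str, Tuple[int, int]] = {
    "areg_address": (18, 21),
    "breg_address": (22, 25),
    "operand_select": (26, 28),
    "carry_in_select": (29, 29),
    "alu_function": (30, 32),
    "destination": (33, 35),
    "shift_in_select": (36, 37),
}


def field(bits: Sequence[int], first: int, last: int) -> int:
    """Pack bits P<first>..P<last> into an integer (assumed MSB first)."""
    value = 0
    for p in range(first, last + 1):
        value = (value << 1) | (bits[p] & 1)
    return value


def decode_cpu_a(bits: Sequence[int]) -> Dict[str, int]:
    """Extract every CPU-A control field from the micro-instruction bits."""
    return {name: field(bits, first, last)
            for name, (first, last) in CPU_A_FIELDS.items()}
```

The seven fields together account for the 20 bits (4 + 4 + 3 + 1 + 3 + 3 + 2) allotted to CPU-A.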

The second computational unit, CPU-B in FIG. 1, is substantially identical to CPU-A shown in FIG. 2, but only a subset of the functions is used. The 16 general purpose registers in its register file are all available, but its register Q is ignored, as are the shift networks and the carry-in select, and the arithmetic/logical unit is restricted to add and subtract functions only.

CPU-B takes its input from the register Z which in turn takes as its input the output word from CPU-A, either directly or via a patch field which may be wired specially to provide a 16-bit permutation of the data. Its output word is used for memory addressing in the same instruction cycle as the address is calculated.

The operation of CPU-B is controlled by 12 bits of the micro-instruction word, P38 to P49. The first 8 bits provide the two register file addresses, i reg, j reg:

    ______________________________________
    P38-P41     First address (i reg) in register file
    ______________________________________
    P42-P45     Second address (j reg) in register file
    ______________________________________

and the next two bits control the selection of the two operands s, t as follows:

    ______________________________________
    P46,P47   First operand (s)   Second operand (t)
    ______________________________________
    0         i reg               j reg
    1         Register φ          i reg
    2         Register Z          i reg
    3         Register Z          i reg
                                  Register φ
    ______________________________________

The next bit, P48, determines which of two functions is to be performed by the arithmetic/logical unit on the two operands, and bit P49 determines the CPU-B output word (always the output function g of the arithmetic/logical unit) and the destination of the output function g.

    ______________________________________
    P48        Function value (g)
    ______________________________________
    0          s + t
    1          t - s
    ______________________________________
               Register file      Output
    P49        result (j reg)     word
    ______________________________________
    0          --                 g
    1          g                  g
    ______________________________________

The multiplication unit 22 takes two 16-bit signed input words in two's complement notation from the registers Y and W respectively, and calculates a 32-bit signed product. A 16-bit result is selected from the 32-bit product as follows. Two standard selections are provided: the least significant 16 bits (bits 0 to 15) and the most significant 16 bits (bits 16 to 31). Two additional selections may be made using a patch field 32 specially wired as required. The facility is also provided for automatic rounding of the 16-bit result by adding a `1` in the appropriate bit position (eg bit 15 for the most significant 16 bits result).

The selected 16-bit result from the multiplication unit is written into register R and may then be transferred in a subsequent instruction cycle to register D for processing in CPU-A.

The operation of the multiplication unit 22 is controlled by 4 bits, P50 to P53, of the micro-instruction word as follows:

    ______________________________________
    P50,P51         Multiply result selection
    ______________________________________
    0               Least significant 16 bits
    1               Patch 1
    2               Patch 2
    3               Most significant 16 bits
    ______________________________________
    P52             Multiply rounding
    ______________________________________
    0               No rounding
    1               Round
    ______________________________________
    P53             Multiply enable
    ______________________________________
    0               No-op
    1               Multiply
    ______________________________________
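The two standard result selections and the rounding facility may be sketched in Python as follows. This is a minimal sketch only: the function names are hypothetical, the two patch selections are left unimplemented because they depend on how the patch field 32 is wired, and rounding is assumed to have no effect on the least significant 16 bits result.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a two's complement signed value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def multiply(y: int, w: int, select: int, rounding: bool = False) -> int:
    """Return the 16-bit result chosen by bits P50,P51 (select) and P52 (rounding)."""
    product = (to_signed16(y) * to_signed16(w)) & 0xFFFFFFFF  # 32-bit product
    if select == 0:
        shift = 0    # least significant 16 bits (bits 0 to 15)
    elif select == 3:
        shift = 16   # most significant 16 bits (bits 16 to 31)
    else:
        raise NotImplementedError("patch selections depend on the wiring")
    if rounding and shift > 0:
        # Add a 1 in the bit just below the selected field (bit 15 here).
        product = (product + (1 << (shift - 1))) & 0xFFFFFFFF
    return (product >> shift) & 0xFFFF
```

For example, multiplying 0x4000 by 0x0002 gives the 32-bit product 0x00008000, whose most significant 16 bits are 0 without rounding and 1 with rounding.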

The scale accumulation unit 23 is provided to maintain a check on the most significant bits of data being written from the register Y to the memory section 2. The register Y may receive data either from CPU-A or from an internal data bus 33 (shown as a solid bus line in FIG. 1) which also provides a source of data for the register D and the scale accumulation unit 23. The data on the internal bus may itself come either from the register 17, which provides a constant for the computation section, or from the memory section 2 of the processor unit.

The purpose of the scale accumulation unit 23 is to warn of close-to-overflow conditions, enabling preventive scaling to be effected when the data is next accessed. The unit contains two 6-bit mask registers (not shown) in which two different threshold values, derived from the low-order 12 bits of the 16-bit data word on the internal data bus 33 taken from the constant register 17, are stored. The threshold value stored in the first mask register corresponds to the high order 6 bits (bits 15 to 10) of the data word being transferred from the register Y to the memory section 2, while that stored in the second mask register corresponds to bits 14 to 9. The two 6-bit data word values are each compared with the appropriate 6-bit threshold value to produce respective status bits indicating whether the values of the data word bits are above or below their threshold values. The two status bits are made available for storage in the flag register 16 and can be used to indicate whether the data word value lies above the higher threshold value, below the lower threshold value or between the two threshold values.

The unit operates in two modes, one for signed data and one for unsigned data. In the unsigned mode, the operation for each mask register is as follows: the 6 bits of the mask are ANDed with the appropriate 6 bits of the data word to produce a 6-bit result, and these 6 bits are then ORed together and with the corresponding existing status bit to give a new status bit. Thus the status bits are updated although the mask register values remain unchanged. In the signed mode, the bits of the data are first XORed with the sign bit and then operation continues as for the unsigned mode.

The operation mode, signed or unsigned, is determined by an initialization instruction contained in the micro-instruction word. At initialization, the two status bits are cleared and the two mask registers loaded from the internal data bus 33. Subsequent updating of the status bits in the selected mode is effected by an `update enable` instruction from the micro-instruction word. The instructions controlling the scale accumulation unit 23 are contained in bits P54, P55 of the micro-instruction word as follows:

    ______________________________________
    P54,P55        Scale accumulator operation
    ______________________________________
    0              No-operation
    1              Enable update of status bits
    2              Initialize for signed mode
    3              Initialize for unsigned mode
    ______________________________________
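The initialize and update operations of the scale accumulation unit may be sketched in Python as follows. The class and method names are hypothetical. Because the description does not state how the low-order 12 bits of the constant are divided between the two mask registers, the sketch simply takes the two 6-bit masks as separate arguments.

```python
from typing import List


class ScaleAccumulator:
    """Illustrative model of the two mask registers and two status bits."""

    def __init__(self) -> None:
        self.signed = False
        self.masks: List[int] = [0, 0]
        self.status: List[int] = [0, 0]

    def initialize(self, mask1: int, mask2: int, signed: bool) -> None:
        """Codes 2 and 3: load both masks, clear both status bits, set the mode."""
        self.signed = signed
        self.masks = [mask1 & 0x3F, mask2 & 0x3F]
        self.status = [0, 0]

    def update(self, word: int) -> None:
        """Code 1: fold one data word into the status bits."""
        word &= 0xFFFF
        if self.signed and word & 0x8000:
            word ^= 0xFFFF  # XOR every data bit with the (set) sign bit
        fields = [(word >> 10) & 0x3F,  # bits 15 to 10, first mask
                  (word >> 9) & 0x3F]   # bits 14 to 9, second mask
        for i in range(2):
            # AND with the mask, OR the six result bits together,
            # then OR with the existing status bit.
            if fields[i] & self.masks[i]:
                self.status[i] = 1
```

Once set, a status bit remains set until the next initialization, so that a single large word anywhere in a block of data is sufficient to raise the warning.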

The next 13 bits, P56-P68, of the micro-instruction word control the routing of data and various status bits between various parts of the register. In particular they control the source and destination of data and status bits which are written into or read out of a number of registers, namely registers D,Y,Z,W and the flag register. They also control the source and destination of data entered onto the internal data bus 33. Data may be entered onto the internal bus either from the constant register 17, the processor unit's internal memory or from an external memory via a register I.

The flag register 16, and registers D,Y,Z and W are controlled by clock bits and select bits from the micro-instruction word, the clock bits determining whether a register retains its current value or has new data clocked into it, and the select bits determining which data is entered.

The micro-instruction word controls the above registers and internal data bus as follows:

    ______________________________________
    P56             Y register select
    ______________________________________
    0               data bus
    1               CPU-A output
    ______________________________________
    P57             Y register clock
    ______________________________________
    P58             Z register select
    ______________________________________
    0               CPU-A output
    1               Patched CPU-A output
    ______________________________________
    P59             Z register clock
    ______________________________________
    P60             W register clock
    ______________________________________
    P61             D register select
    ______________________________________
    0               data bus
    1               multiply output (R)
    ______________________________________
    P62             D register clock
    ______________________________________
    P63,P64         data bus loading
    ______________________________________
    0               C register (constant from micro-instruction)
    1               internal memory data
    2               I register (from external memory)
    3               --
    ______________________________________
    P65-P68         Flag register, F, select and clock
    ______________________________________
    0               CPU-A register shift out
    1               CPU-A register shift out
    2               CPU-A Q shift out
    3               CPU-A Q shift out
    4               Constant 1
    5               Constant 1
    6               Constant 0
    7               No-op (current value retained)
    8               CPU-A carry out
    9               CPU-A overflow
    10              CPU-A sign of function value
    11              CPU-A zero function value
    12              CPU-B zero function value
    13              CPU-B carry out
    14              Scale accumulator first mask
    15              Scale accumulator second mask
    ______________________________________

The memory section 2 of the processor unit includes an internal read/write memory 35 of 4K words (address range 0 to 4095) for storing variable data, and a PROM 36 also of 4K words (address range 4096 to 8191) for storing data constants and tables, eg sine and cosine values. This section of the processor unit also includes the above-mentioned master interface arrangement, comprising a master data bus receiver/transmitter 42 and a master address bus buffer 43, and the slave interface arrangement comprising a slave data bus receiver/transmitter 44 and a slave address bus buffer 45.

When the processor unit controls one or more slave units, its slave data bus receiver/transmitter 44 and its slave address bus buffer 45 are connected respectively, by means of a 16-bit slave data bus 46 and a 16-bit slave address bus 47, to the master data bus receiver/transmitter 42 and the master address bus buffer 43 of each slave. Similarly, when the processor unit is itself a slave unit (it may also be a master controlling its own slave units), its master data bus receiver/transmitter 42 and its master address bus buffer 43 are connected respectively, by means of a 16-bit master data bus 48 and a 16-bit master address bus 49, to the slave data bus receiver/transmitter 44 and the slave address bus buffer 45 of its master.

The internal memories 35,36 of the processing unit may be addressed via a latch register 38 by the output word of CPU-B. The output word of CPU-B may also be used to address the internal read/write memories 35 of any slave units (or other external memories to which it is connected) via the slave address bus buffer 45 and slave address bus 47. Data read from either of the internal memories 35,36 is entered onto the internal data bus 33. Data being written into the read/write memory 35 (data cannot be written into the PROM 36) derives either from the register Y or from outside, eg from the master processing unit or another external memory via the master data bus 48 and master data bus receiver/transmitter 42. Data from either of these sources is transmitted to the internal read/write memory 35 via a multiplexer 39 and latch register 40.

The memory address operation is controlled by a single bit, P69, of the micro-instruction word as follows:

    ______________________________________
    P69            Internal memory operation
    ______________________________________
    0              read only
    1              read then write
    ______________________________________

Thus a read operation is always performed, although the data read may or may not be selected for entry onto the internal data bus 33 according to micro-instruction word bits P63,P64. When the write operation is enabled, this is done after a read operation so that in a read and write instruction, the data read is the old contents of the memory address.
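The read-then-write behaviour selected by bit P69 may be sketched in Python as follows. The function name is hypothetical, the two internal memories are modelled as a single list covering addresses 0 to 8191, and it is an assumption of this sketch that a write directed at a PROM address is simply ignored.

```python
from typing import List

RAM_WORDS = 4096   # read/write memory 35: addresses 0 to 4095
PROM_WORDS = 4096  # PROM 36: addresses 4096 to 8191


def memory_cycle(memory: List[int], address: int, p69: int,
                 write_data: int = 0) -> int:
    """Perform one internal memory operation and return the word read."""
    old_value = memory[address]            # a read is always performed first
    if p69 == 1 and address < RAM_WORDS:   # the PROM cannot be written
        memory[address] = write_data & 0xFFFF
    return old_value                       # old contents, even when writing
```

A read and write instruction therefore exchanges a word in a single cycle: the caller receives the previous contents of the address while the new word takes its place.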

In communications between a master and its slaves, the top 4 bits of the 16-bit slave address bus 47 are used to designate a particular slave (there may therefore be up to 16 slaves connected to one master) and the appropriate slave unit recognizes these 4 bits in an address decode unit 50 associated with its master address bus buffer 43. The remaining low order 12 bits of the slave address bus 47 determine the address in the slave's internal read/write memory 35 (the master cannot access a slave's PROM 36). Thus, the internal read/write memories of up to 16 slave units appear as a single external address space of up to 64K words (16×4K words read/write memory).
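The division of the 16-bit slave address into a slave number and a memory address may be sketched in Python as follows; the function names are hypothetical.

```python
from typing import Tuple


def split_slave_address(address: int) -> Tuple[int, int]:
    """Return (slave number, address within that slave's read/write memory)."""
    address &= 0xFFFF
    return address >> 12, address & 0x0FFF  # top 4 bits, low order 12 bits


def make_slave_address(slave: int, offset: int) -> int:
    """Combine a slave number (0 to 15) and a 12-bit address into one word."""
    return ((slave & 0xF) << 12) | (offset & 0x0FFF)
```

For example, the address 0x3ABC designates location 0xABC in the read/write memory of slave 3, and the 16 possible slave numbers with 4096 locations each give the 65536-word (64K) external address space.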

Data may either be written into or read from the selected address in a slave's read/write memory, via the slave data bus 46, but this may only be done when the slave is idle, ie when bit S1 of its status register 14 is `0`. A slave's status register 14 may however be accessed at any time for monitor and control purposes. When accessing a slave's status register the same 16-bit slave data and slave address buses 46,47 are used, although only the top 4 bits of the slave address bus, designating a particular slave, and the low order 6 bits of the slave data bus (corresponding to the 6 bits of the status register) are relevant.

A distribute function is provided for writing the same data unit into the read/write memories of several slaves simultaneously (at the same address in each). When the distribute function is specified by the micro-instruction word of the master processor unit (discussed below), all slaves having their status register S2 bits set at `1` will accept data regardless of the top 4 bits on the master's slave address bus 47 (which is connected to the master address buses 49 of its slaves).

Control of the master/slave communications is determined by bits P70 to P72 of the micro-instruction word as follows:

    ______________________________________
    P70-P72 communications functions
    ______________________________________
    0       write to slave's status register
    1       read from slave's status register
    2       write to slave's memory
    3       read from slave's memory
    4       no-op
    5       no-op
    6       distribute data to several slaves' memories
    7       read data from one slave and distribute to several
    ______________________________________
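The difference between an ordinary write to a slave's memory (function 2) and the distribute function (function 6) may be sketched in Python as follows. The Slave class and the master_write function are hypothetical, and it is an assumption of this sketch that the requirement for a slave to be idle (S1 at `0`) applies to distributed writes as it does to ordinary writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slave:
    number: int  # 0 to 15, matched against the top 4 address bits
    s1: int = 0  # status bit S1: 0 = idle, 1 = executing
    s2: int = 0  # status bit S2: 1 = accept distributed data
    memory: Dict[int, int] = field(default_factory=dict)  # sparse 4K memory


def master_write(slaves: List[Slave], address: int, data: int,
                 distribute: bool = False) -> List[int]:
    """Write one word; return the numbers of the slaves that accepted it."""
    selected, offset = (address >> 12) & 0xF, address & 0x0FFF
    accepted = []
    for slave in slaves:
        # Distribute ignores the top 4 address bits and uses S2 instead.
        addressed = slave.s2 == 1 if distribute else slave.number == selected
        if addressed and slave.s1 == 0:  # assumed: only idle slaves accept
            slave.memory[offset] = data & 0xFFFF
            accepted.append(slave.number)
    return accepted
```

With four slaves of which numbers 1 and 2 are idle with S2 set, a distributed write to address 7 reaches both at once, whereas an ordinary write to address 0x1005 reaches slave 1 only.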

Each of these communications functions actually takes two operating cycles to complete.

In a master-to-slave write operation, the data, taken from the master's register Y, and the address, taken from CPU-B, are sent in cycle 1 to the slave across the master's slave data bus 46 and slave address bus 47 respectively, and enter the slave via the slave's master data bus receiver/transmitter 42 and master address bus buffer 43. In cycle 2 the slave uses the data and address to perform the write into its read/write memory 35 or status register 14 as appropriate.

In a master/slave read operation, the address from CPU-B of the master is sent to the slave's master address bus buffer 43 across the master's slave address bus 47 in cycle 1. In cycle 2 the slave uses the address to read the data from its internal read/write memory 35, enters it on the internal data bus 33 and transfers it to the master via its master data bus receiver/transmitter 42 and master data bus 48. The data arrives at the master's register I, via its slave data bus receiver/transmitter 44, early enough in cycle 2 to be selected for entry onto the master's internal data bus 33 according to bits P63,P64 of its micro-instruction word.

A slave's status register 14 may similarly be read by a master using the same data and address buses.

It will be seen that there is a potential clash in use of the master-to-slave data bus 46 when a read instruction to a slave is immediately followed by a write instruction. This clash is resolved by giving the outgoing write data precedence over the incoming read data from the slave. Thus, the read data is lost, resulting in the master's register I containing the write data which is then entered onto the internal data bus as for a normal read operation.

Since the internal and external (communications) memory functions can be carried out simultaneously, it is possible for block transfers of data (ie groups of data words) between a master and slave(s) to be performed at the rate of one data word per instruction cycle.

For block transfer of data from master to slave(s) a data word (N) is read from an address (u) in the read/write memory 35 of the master specified by the output word of its CPU-B, and routed to its register Y in the first cycle. In the second cycle a new address (u+1) is generated by the master's CPU-B and used to read out another data word (N+1) from the master's memory 35 which is transferred to the master's register Y as before. Also during the second cycle, the data word N from the register Y is transmitted to the slave(s) together with the new address word (u+1) and is written into the read/write memory 35 of the slave(s) at an address word (u+1). Thus, the destination in the slave(s) is offset by one from the master's source address.
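The overlapped master-to-slave block transfer, and the resulting offset of one between source and destination addresses, may be sketched in Python as follows. The sketch models only the addresses and the register Y, one loop iteration per instruction cycle; the function name is hypothetical.

```python
from typing import List, Optional


def block_transfer_to_slave(master_mem: List[int], slave_mem: List[int],
                            start: int, count: int) -> None:
    """Copy count words from the master, beginning at address start."""
    y: Optional[int] = None            # contents of the master's register Y
    for cycle in range(count + 1):     # one extra cycle drains register Y
        address = start + cycle        # address generated by CPU-B this cycle
        if y is not None:
            slave_mem[address] = y     # word read in the previous cycle
        # Read the next source word into Y, except in the draining cycle.
        y = master_mem[address] if cycle < count else None
```

After a transfer of four words starting at address 2, locations 3 to 6 of the slave hold the contents of locations 2 to 5 of the master, since each word is written at the address generated one cycle after it was read.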

For block transfer of data from slave to master, the slave's address is sent to the slave in one cycle, the data from that address returns to the master's register I then Y in the next cycle, and is written into the master's memory in the third cycle. Again, as these three-cycle operations overlap, the net result is that the destination address in the master is offset by two from the slave's source address.
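The three overlapped stages of a slave-to-master block transfer may be sketched in Python as follows. As before, the sketch models only addresses and register contents with one loop iteration per instruction cycle, and the function and variable names are hypothetical.

```python
from typing import List, Optional


def block_transfer_from_slave(slave_mem: List[int], master_mem: List[int],
                              start: int, count: int) -> None:
    """Copy count words from the slave, beginning at address start."""
    pending: Optional[int] = None  # address sent to the slave last cycle
    y: Optional[int] = None        # word held in the master's register Y
    for cycle in range(count + 2): # two extra cycles drain the pipeline
        address = start + cycle    # address generated by CPU-B this cycle
        if y is not None:
            master_mem[address] = y  # third stage: write into master memory
        # Second stage: data for last cycle's address arrives (I then Y).
        y = slave_mem[pending] if pending is not None else None
        # First stage: send this cycle's address to the slave.
        pending = address if cycle < count else None
```

After a transfer of three words starting at address 2, locations 4 to 6 of the master hold the contents of locations 2 to 4 of the slave, which is the offset of two described above.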

Block transfer of data from one slave to other slaves may also be effected using the read and distribution functions. Data is read from a read/write address in the slave specified by the top 4 bits on its master address bus 49 in one cycle, and written into the same address in the read/write memories of all slaves which have their status register S2 bits set at `1` in the following cycle. The destination address is offset by one from the source address.

Thus each processing unit is responsive on receipt of a single starting instruction entered into its status register 14 to perform the execution of a stored microprogram selected by the starting instruction. The activity status, idle or executing, can be continuously monitored from outside to ascertain when the unit has completed execution of the program, following which results of its computations can be recovered from it. While executing the program, the processing unit may control and monitor, and access and transfer data between the internal read/write memories 35 of, its slave processing units. Thus, a processing unit, in execution of a program, may delegate functions to each of its slaves by transferring data to their read/write memories, initiating their program execution on that data, and then recovering the results of those computations when completed.

A typical multiple-processor system configuration incorporating a plurality of substantially identical processing units in accordance with the invention is shown in FIG. 3, illustrating the manner in which the processing units can be interconnected to perform complex signal processing problems requiring computing power greater than that which can be achieved using a single unit.

The configuration starts at the top of the tree with a first processing unit PU1 whose master interface arrangement is connected by means of a data and address bus set to a host minicomputer (not shown) which controls the overall operation of the system via the top unit PU1. The host minicomputer need access only the status register 14 of PU1 in order to initiate and monitor its operation, although in practice it may also provide input and output of data to and from the read/write memory of PU1.

The slave interface arrangement of PU1 is in turn connected by a data and address bus network to the master interface arrangements of each of four slave processing units PU2 to PU5, and to an input/output (I/O) interface unit 60. When in its execution mode, PU1 may communicate with the I/O interface unit to load data into its own internal read/write memory 35, transfer data between the I/O interface unit and any one or all of its slave units PU2 to PU5, transfer data from its own read/write memory to that of any one or all of its slaves, or transfer data between any combination of its slaves via the data and address bus network. Writing of data into or reading of data from any of the slaves can only be done while the particular slave or slaves concerned are idle. The same data and address bus network is used by PU1 to control and monitor the operation of the slave units PU2 to PU5 by addressing their status registers.

In the same manner as PU1 is connected to slaves PU2 to PU5, and to the I/O interface unit 60, the slave unit PU5 is itself connected to an I/O interface unit 61, and its own set of three slaves PU6 to PU8, and the same considerations regarding control, monitoring and inter-communications apply. The third of these slaves, PU8, has its slave interface unit connected to an external memory 63, effectively providing the system with an increased memory workspace. The size of this memory may be as large as the 16-bit address word capability of its slave interface arrangement will allow, ie 64K words.

While it has been found convenient to use a separate memory device, ie the status register 14, to provide the means whereby the processing unit can be controlled and monitored from outside, this memory space may alternatively be provided by a part of the externally accessible read/write memory area provided in the example by the read/write memory 35. This part of the internal read/write memory may then be accessed from outside via the master interface arrangement, for the purpose of writing control instructions to, and for monitoring the status of, the processing unit. In such an arrangement, the control section will also be able to access this part of the internal read/write memory for the purpose of reading control instructions previously written into it from outside, and writing into it data relating to the operating status of the processing unit for monitoring purposes.
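
The alternative arrangement, in which a part of the read/write memory itself serves for control and monitoring, may be sketched in Python as below. The layout is an assumption made for illustration: word 0 is taken to hold the activity status and word 1 the starting address, and the names `Unit`, `external_read`, `external_write` and `finish` are hypothetical. The reserved words remain accessible from outside in both modes, while the remaining words are accessible only when the unit is idle.

```python
IDLE, EXECUTING = 0, 1
STATUS, START = 0, 1   # hypothetical layout of the reserved control words
CONTROL_WORDS = 2      # words 0 and 1 form the predetermined control part


class Unit:
    """Toy unit whose control and status words live in its read/write memory."""

    def __init__(self, size=8):
        self.memory = [0] * size  # word 0 starts at IDLE

    def _check(self, address):
        # Outside the control part, external access is allowed only when idle.
        if address >= CONTROL_WORDS and self.memory[STATUS] == EXECUTING:
            raise PermissionError("data words are accessible only when idle")

    def external_read(self, address):
        self._check(address)
        return self.memory[address]

    def external_write(self, address, value):
        self._check(address)
        self.memory[address] = value

    def finish(self):
        # The control section writes the idle status on completing the program.
        self.memory[STATUS] = IDLE


unit = Unit()
unit.external_write(4, 7)               # load a data word while idle
unit.external_write(START, 0x20)        # assumed starting address of a program
unit.external_write(STATUS, EXECUTING)  # control instruction: begin execution
print(unit.external_read(STATUS))       # the status can still be monitored
```

Attempting `unit.external_read(4)` at this point raises `PermissionError` in the model; after `unit.finish()` the same read succeeds.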

Furthermore, communications to and between processing units for these control and monitor purposes may be effected by a separate control bus arrangement. Each processing unit will then have a separate interface arrangement solely for use in transmitting control instructions to, and receiving data relating to the operating status of, any slave processing units to which it is connected; and for receiving control instructions from, and transmitting data relating to its operating status to, its controller, eg its master processing unit.

I claim:
 1. A data processing unit comprising: a control section responsive to a control instruction to switch the processing unit from an idle mode to an execution mode, the control section being operative during the execution mode to initiate and control the sequence of execution of a program stored therein and to automatically switch the processing unit to the idle mode in which the control section is no longer operative to initiate and control the sequence of execution of a program stored therein, a memory section including an area of read/write memory, a computation section for performing calculations in accordance with instructions from the program and capable of accessing the read/write memory area during execution of the program, interface means having access to the area of read/write memory through which the processing unit is able to access an external memory when in its execution mode, through which at least part of the read/write memory area of the processing unit is able to be accessed from outside when in its idle mode and through which said control instruction for switching the processing unit from its idle to its execution mode is able to be received from outside; a predetermined part of the read/write memory area of the memory section which is accessible from outside via the interface means being adapted to receive said control instruction, and the control section including means for responding to the control instruction in the predetermined part of the read/write memory area, the interface means comprising a master interface means including a master address bus through which the processing unit is able to receive address words designating locations in its read/write memory and a master data bus through which the processing unit is able to receive or transmit data to be written into or read from the memory locations designated by said address words, through which a predetermined part of the externally accessible read/write memory area is able to be accessed from outside for control and monitor purposes when the processing unit is in both its idle and execution modes, and through which the remaining part of the externally accessible read/write memory area is able to be accessed from outside only when the processing unit is in its idle mode, and slave interface means including a slave address bus through which the processing unit is able to transmit memory addresses to an external memory area and a slave data bus through which data to be written into or read from the external memory is able to be transmitted or received, through which the processing unit is connectable to access the external memory area only when in its execution mode.
 2. A data processing unit as claimed in claim 1 wherein said predetermined part of the externally accessible read/write memory area is adapted to be accessed both from outside via the master interface means and by the control section during both the idle and the execution modes of the processing unit, and the control section includes means for writing into it data indicating a change in the current activity status, idle or executing, of the processing unit, and for reading from it control instruction data indicating a required change in the activity status, idle or executing, of the processing unit written into said predetermined part of the read/write memory area from outside via the master interface means, whereby to enable the activity status of the processing unit to be monitored and controlled from outside via the interface means.
 3. A processing unit as claimed in claim 2 wherein the activity status of the processing unit is indicated by a one-bit binary work location in said predetermined part of the read/write memory area, one value of which indicates the idle mode and the other value of which indicates the execution mode of the processing unit.
 4. A processing unit as claimed in claim 2 or claim 3 wherein the control section is capable of controlling the execution of either one or one of a plurality of programs the sequential instructions of which are stored in programmable read only memory (PROM), each program having a different starting address within the PROM, the predetermined part of externally accessible read/write memory area including a memory location into which control instruction data indicating the starting address of one of said programs can be written from outside via the master interface means as part of said control instruction, and from which this data can be read by the control section for program selection.
 5. A data processing unit as claimed in claim 1, 2 or 3 wherein said predetermined part of the externally accessible read/write memory area is provided by a separate memory device.
 6. A data processing unit as claimed in claim 1 wherein the external memory address capacity of the unit via the slave interface means is substantially greater than the internal read/write memory area accessible from outside via the master interface means.
 7. A data processing system comprising a plurality of data processing units each as claimed in claim 1 wherein the slave interface means of a first of said processing units is connected to the master interface means of each of one or more other processing units, whereby the first processing unit, when in its execution mode, is able to access the externally accessible internal read/write memory area of the or each of the other processing units which effectively constitute an external memory area for the first processing unit.
 8. A data processing system as claimed in claim 7 wherein the slave interface means of at least one of said other processing units is connected to the master interface means of at least one further processing unit individually associated with it, whereby the externally accessible internal read/write memory area of the or each of said other processing units effectively constitutes an external memory area accessible to the said other processing unit to which it is connected.
 9. A data processing system as claimed in claim 7 or 8 wherein the externally accessible internal read/write memory areas of processing units which have their master interface means connected to the slave interface means of a common processing unit each constitute a separately identifiable section of the external memory address capacity of the common processing unit to which they are connected.
 10. A data processing system as claimed in claim 7 or 8 wherein the interconnection between any processing units which have their master interface means connected to the slave interface means of a common processing unit comprises a common address bus through which the common processing unit is able to transmit address words designating memory locations in the externally accessible read/write memory areas of the processing units to which its slave interface means is connected, and the master interface means of each of said processing units includes means for decoding said address words to determine whether the designated memory location lies within its internal read/write memory area, the interconnection further comprising a common data bus through which the common processing unit is able to write data into or read data from the memory locations designated by the address words.